-
Calibration measures quantify how much a forecaster's predictions violate calibration, which requires that forecasts be unbiased conditioned on the forecasted probabilities. Two important desiderata for a calibration measure are its decision-theoretic implications (i.e., downstream decision-makers that best respond to the forecasts are always no-regret) and its truthfulness (i.e., a forecaster approximately minimizes error by always reporting the true probabilities). Existing measures satisfy at most one of these properties, but not both. We introduce a new calibration measure termed subsampled step calibration, StepCE^sub, that is both decision-theoretic and truthful. In particular, on any product distribution, StepCE^sub is truthful up to an O(1) factor, whereas prior decision-theoretic calibration measures suffer from an e^{-Ω(T)} vs. Ω(√T) truthfulness gap. Moreover, in any smoothed setting where the conditional probability of each event is perturbed by noise of magnitude c > 0, StepCE^sub is truthful up to an O(√(log(1/c))) factor, while prior decision-theoretic measures have an e^{-Ω(T)} vs. Ω(T^{1/3}) truthfulness gap. We also prove a general impossibility result for truthful decision-theoretic forecasting: any complete and decision-theoretic calibration measure must be discontinuous and non-truthful in the non-smoothed setting.
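To make the shape of such a measure concrete, here is a minimal Python sketch of a step-function calibration error and a subsampled variant. The paper's exact StepCE^sub definition is not reproduced here; the functions below illustrate one natural reading of "step calibration" (worst-case bias over threshold test functions 1[p_t >= theta]) and of "subsampled" (averaging the error over random subsets of rounds). Names, normalization, and the subsampling scheme are our assumptions.

```python
import numpy as np

def step_calibration_error(p, y):
    """Worst-case bias over step test functions 1[p_t >= theta].

    One natural reading of a 'step' calibration error: for every
    threshold theta, measure |sum over {t : p_t >= theta} of (y_t - p_t)|
    and take the maximum. Sorting makes every threshold a suffix sum.
    """
    p, y = np.asarray(p, float), np.asarray(y, float)
    resid = (y - p)[np.argsort(p)]
    suffix = np.cumsum(resid[::-1])[::-1]  # suffix sums = per-threshold bias
    return float(np.max(np.abs(suffix))) if len(suffix) else 0.0

def subsampled_step_ce(p, y, rate=0.5, trials=200, seed=0):
    """Average the step error over random subsets of rounds, the
    'subsampled' ingredient that, per the abstract, buys truthfulness."""
    rng = np.random.default_rng(seed)
    p, y = np.asarray(p, float), np.asarray(y, float)
    vals = []
    for _ in range(trials):
        mask = rng.random(len(p)) < rate
        if mask.any():
            vals.append(step_calibration_error(p[mask], y[mask]))
    return float(np.mean(vals)) if vals else 0.0
```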
-
We study calibration measures in a sequential prediction setup. In addition to rewarding accurate predictions (completeness) and penalizing incorrect ones (soundness), an important desideratum of calibration measures is truthfulness, a minimal condition for the forecaster not to be incentivized to exploit the system. Formally, a calibration measure is truthful if the forecaster (approximately) minimizes the expected penalty by predicting the conditional expectation of the next outcome, given the prior distribution of outcomes. We conduct a taxonomy of existing calibration measures. Perhaps surprisingly, all of them are far from being truthful. We introduce a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE), which is complete and sound, and under which truthful prediction is optimal up to a constant multiplicative factor. In contrast, under existing calibration measures, there are simple distributions on which a polylogarithmic (or even zero) penalty is achievable, while truthful prediction leads to a polynomial penalty.
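For concreteness, the sketch below approximates the classical (non-subsampled) smooth calibration error by discretizing the 1-Lipschitz test function on a grid and solving a small linear program, then averages that error over random subsamples of rounds. This illustrates the ingredients named in the abstract rather than the paper's exact SSCE; the grid size, subsampling rate, and normalization are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def smooth_ce(p, y, grid=101):
    """Discretized LP approximation of the (unnormalized) smooth
    calibration error: sup over 1-Lipschitz f: [0,1] -> [-1,1] of
    sum_t f(p_t) * (y_t - p_t)."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    h = 1.0 / (grid - 1)
    idx = np.rint(p / h).astype(int)      # snap each forecast to the grid
    c = np.zeros(grid)
    np.add.at(c, idx, y - p)              # objective coefficient of f at each knot
    # Lipschitz constraints |f[i+1] - f[i]| <= h as paired inequalities.
    D = np.zeros((grid - 1, grid))
    rows = np.arange(grid - 1)
    D[rows, rows], D[rows, rows + 1] = -1.0, 1.0
    A = np.vstack([D, -D])
    b = np.full(2 * (grid - 1), h)
    res = linprog(-c, A_ub=A, b_ub=b, bounds=[(-1, 1)] * grid, method="highs")
    return -res.fun

def ssce_sketch(p, y, rate=0.5, trials=50, seed=0):
    """Subsample rounds, score each subsample with smooth_ce, average."""
    rng = np.random.default_rng(seed)
    p, y = np.asarray(p, float), np.asarray(y, float)
    vals = [smooth_ce(p[m], y[m])
            for m in (rng.random(len(p)) < rate for _ in range(trials))
            if m.any()]
    return float(np.mean(vals)) if vals else 0.0
```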
-
Version control systems typically rely on a patch language, heuristic patch synthesis algorithms like diff, and three-way merge algorithms. Standard patch languages and merge algorithms often fail to identify conflicts correctly when there are multiple edits to one line of code or code is relocated. This paper introduces Grove, a collaborative structure editor calculus that eliminates patch synthesis and three-way merge algorithms entirely. Instead, patches are derived directly from the log of the developer's edit actions and all edits commute, i.e., the repository state forms a commutative replicated data type (CmRDT). To handle conflicts that can arise due to code relocation, the core datatype in Grove is a labeled directed multi-graph with uniquely identified vertices and edges. All edits amount to edge insertion and deletion, with deletion being permanent. To support tree-based editing, we define a decomposition from graphs into groves, which are sets of syntax trees with conflicts—including local, relocation, and unicyclic relocation conflicts—represented explicitly using holes and references between trees. Finally, we define a type error localization system for groves that enjoys a totality property, i.e., all editor states in Grove are statically meaningful, so developers can use standard editor services while working to resolve these explicitly represented conflicts. The static semantics is defined as a bidirectional marking system in line with recent work, with gradual typing employed to handle situations where errors and conflicts prevent type determination. We then layer on a unification-based type inference system to opportunistically fill type holes and fail gracefully when no solution exists. We mechanize the metatheory of Grove using the Agda theorem prover. We implement these ideas as the Grove Workbench, which generates the necessary data structures and algorithms in OCaml given a syntax tree specification.
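The key invariant, that all edits commute and deletion is permanent, can be illustrated with a toy replicated graph in Python. The sketch below is not Grove's calculus: the three-state edge lifecycle (absent, live, tombstoned) is our simplification, and all identifiers are illustrative.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Edge:
    eid: str    # globally unique edge id (assumed unique per creation)
    src: str    # source vertex id
    pos: str    # child-position label
    dst: str    # destination vertex id

@dataclass
class ReplicatedGraph:
    # Per-edge lifecycle: absent -> "live" -> "dead"; "dead" is terminal,
    # so delete always wins and replaying edits in any order converges.
    status: dict = field(default_factory=dict)   # eid -> "live" | "dead"
    edges: dict = field(default_factory=dict)    # eid -> Edge (live only)

    def insert(self, e: Edge):
        if self.status.get(e.eid) is None:       # no-op if already live or dead
            self.status[e.eid] = "live"
            self.edges[e.eid] = e

    def delete(self, eid: str):
        self.status[eid] = "dead"                # permanent tombstone
        self.edges.pop(eid, None)

    def live_edges(self):
        return list(self.edges.values())

# Any interleaving of the same edit set yields the same state:
# insert(e); delete(e.eid)  ==  delete(e.eid); insert(e)  ==  dead.
```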
-
We provide a unifying framework for the design and analysis of multi-calibrated predictors. By placing the multi-calibration problem in the general setting of multi-objective learning---where learning guarantees must hold simultaneously over a set of distributions and loss functions---we exploit connections to game dynamics to achieve state-of-the-art guarantees for a diverse set of multi-calibration learning problems. In addition to shedding light on existing multi-calibration guarantees and greatly simplifying their analysis, our approach also yields improved guarantees, such as error tolerances that scale with the square root of group size (versus the constant tolerances guaranteed by prior works) and an improvement in the complexity of k-class multi-calibration by a factor exponential in k relative to Gopalan et al. Beyond multi-calibration, we use these game dynamics to address emerging considerations in the study of group fairness and multi-distribution learning.
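As a concrete reference point for the dynamics involved, here is a sketch of the classic multi-calibration patching loop (in the style of Hébert-Johnson et al.): an "auditor" searches for a group and prediction level set with bias above tolerance, and a "learner" corrects the predictions on that cell. The paper's game-dynamics framework generalizes this kind of procedure; the binning scheme and parameter names below are ours.

```python
import numpy as np

def multicalibrate(p, y, groups, alpha=0.05, bins=10, max_rounds=1000):
    """Repeatedly let an 'auditor' find a (group, level set) whose average
    bias exceeds alpha, then let the 'learner' shift predictions on that
    cell by the bias. Stops when every audited cell is within tolerance."""
    p = np.clip(np.asarray(p, float).copy(), 0.0, 1.0)
    y = np.asarray(y, float)
    groups = [np.asarray(g, bool) for g in groups]
    for _ in range(max_rounds):
        worst = None
        level = np.minimum((p * bins).astype(int), bins - 1)
        for g in groups:
            for b in range(bins):
                cell = g & (level == b)
                if not cell.any():
                    continue
                bias = float((y[cell] - p[cell]).mean())
                if abs(bias) > alpha and (worst is None or abs(bias) > abs(worst[1])):
                    worst = (cell, bias)
        if worst is None:
            break                                    # multi-calibrated to tolerance
        cell, bias = worst
        p[cell] = np.clip(p[cell] + bias, 0.0, 1.0)  # learner's best response
    return p
```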
-
Foundation models have vast potential to enable diverse AI applications. The powerful yet incomplete nature of these models has spurred a wide range of mechanisms to augment them with capabilities such as in-context learning, information retrieval, and code interpreting. We propose Vieira, a declarative framework that unifies these mechanisms in a general solution for programming with foundation models. Vieira follows a probabilistic relational paradigm and treats foundation models as stateless functions with relational inputs and outputs. It supports neuro-symbolic applications by enabling the seamless combination of such models with logic programs, as well as complex, multi-modal applications by streamlining the composition of diverse sub-models. We implement Vieira by extending the Scallop compiler with a foreign interface that supports foundation models as plugins. We implement plugins for 12 foundation models including GPT, CLIP, and SAM. We evaluate Vieira on 9 challenging tasks that span language, vision, and structured and vector databases. Our evaluation shows that programs in Vieira are concise, can incorporate modern foundation models, and achieve accuracy comparable to or better than competitive baselines.
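To illustrate what "foundation models as stateless functions with relational inputs and outputs" might look like at the code level, here is a hypothetical plugin interface in Python. It is not Vieira's or Scallop's actual foreign interface; the registry, decorator, predicate name, and stubbed model call are all illustrative assumptions.

```python
from typing import Callable, Iterable, Tuple

# A probabilistic relation: tuples of constants, each with a probability.
Fact = Tuple[tuple, float]

PLUGINS: dict = {}

def foreign_predicate(name: str):
    """Register a stateless function as a relation usable from logic rules."""
    def register(fn: Callable[..., Iterable[Fact]]):
        PLUGINS[name] = fn
        return fn
    return register

@foreign_predicate("classify_animal")
def classify_animal(image_id: str) -> Iterable[Fact]:
    # A real plugin would call a vision model such as CLIP here; we stub
    # the scores so the example is self-contained.
    scores = {"cat": 0.7, "dog": 0.2, "fox": 0.1}
    return [((image_id, label), prob) for label, prob in scores.items()]

# A rule like  pet(X) :- classify_animal(Img, X), domesticated(X).
# would join the model's weighted output tuples with an ordinary relation.
print(PLUGINS["classify_animal"]("img_42"))
```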
-
Type systems typically only define the conditions under which an expression is well-typed, leaving ill-typed expressions formally meaningless. This approach is insufficient as the basis for language servers driving modern programming environments, which are expected to recover from simultaneously localized errors and continue to provide a variety of downstream semantic services. This paper addresses this problem, contributing the first comprehensive formal account of total type error localization and recovery: the marked lambda calculus. In particular, we define a gradual type system for expressions with marked errors, which operate as non-empty holes, together with a total procedure for marking arbitrary unmarked expressions. We mechanize the metatheory of the marked lambda calculus in Agda and implement it, scaled up, as the new basis for Hazel, a full-scale live functional programming environment with, uniquely, no meaningless editor states. The marked lambda calculus is bidirectionally typed, so localization decisions are systematically predictable based on a local flow of typing information. Constraint-based type inference can bring more distant information to bear in discovering inconsistencies, but this notoriously complicates error localization. We approach this problem by deploying constraint solving as a type-hole-filling layer atop this gradual bidirectionally typed core. Errors arising from inconsistent unification constraints are localized exclusively to type and expression holes, i.e., the system identifies unfillable holes using a system of traced provenances, rather than localized in an ad hoc manner to particular expressions. The user can then interactively shift these errors to particular downstream expressions by selecting from suggested partially consistent type hole fillings, which returns control back to the bidirectional system. We implement this type hole inference system in Hazel.
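The core move, totalizing the type system by marking errors instead of rejecting programs, can be sketched on a toy fragment. The Python below is our illustration, not the marked lambda calculus itself: every expression synthesizes a (marked term, type) pair, ill-typed subterms are wrapped in marks (non-empty holes), and the unknown type "?" stands in where gradual typing would.

```python
from dataclasses import dataclass

# Toy expression forms; constructors and type names are ours.
@dataclass
class Num:  val: int                       # type "num"
@dataclass
class Text: val: str                       # type "str"
@dataclass
class Var:  name: str
@dataclass
class Plus: left: object; right: object    # expects "num" operands
@dataclass
class Mark: reason: str; inner: object     # a non-empty hole around an error

UNKNOWN = "?"  # the gradual unknown type: consistent with everything

def synth(e, env):
    """Total marking: every input yields a (marked expression, type) pair;
    ill-typed subterms are wrapped in Marks instead of aborting."""
    if isinstance(e, Num):
        return e, "num"
    if isinstance(e, Text):
        return e, "str"
    if isinstance(e, Var):
        if e.name in env:
            return e, env[e.name]
        return Mark("free variable", e), UNKNOWN   # recover and continue
    if isinstance(e, Plus):
        left, lt = synth(e.left, env)
        right, rt = synth(e.right, env)
        if lt not in ("num", UNKNOWN):
            left = Mark("expected num", left)
        if rt not in ("num", UNKNOWN):
            right = Mark("expected num", right)
        return Plus(left, right), "num"
    raise TypeError(f"unknown form: {e!r}")

# synth(Plus(Text("a"), Var("x")), {}) marks both operands yet still
# returns a typed result, so editor services keep working.
```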